Finding Top-K Frequent Route with Distributed Count-Min Sketch

Venue: Korean Institute of Information Scientists and Engineers (KIISE) Conference, 2015 Winter Conference (conference proceedings)

Korean Title	분산 환경에서 Count-min Sketch를 이용한 Top-K 빈도 검색
English Title	Finding Top-K Frequent Route with Distributed Count-Min Sketch
Author	Fadhilah Kurnia Putri, Seonga An, Joono Kwon
Citation	Vol. 42, No. 02, pp. 0052 ~ 0054 (2015. 12)
Korean Abstract
English Abstract	Because of the sheer volume of data now available, large-scale data analysis has become an important topic. Within it, analyzing traffic data is a meaningful contribution, since traffic congestion is a serious problem. By analyzing the frequency of road trip events, we can identify which routes within a city carry the most trips. However, converting raw traffic data into frequency information raises a scalability problem because of the large size of the data. In this paper, we present a Top-K query processing system for taxi trip events based on a distributed Count-Min Sketch algorithm. First, we analyze raw taxi trip events with a Count-Min Sketch, which uses sub-linear space and therefore demands less memory than the number of distinct elements. Query processing for frequent routes is then done with Spark SQL. Our approach is implemented on Apache Spark, a cluster computing framework. We validate our approach using real taxi trip events from New York City.
Keyword
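
Illustrative sketch (not the authors' implementation): the abstract describes counting route frequencies with a Count-Min Sketch built in a distributed fashion and then answering Top-K queries. The paper does this on Apache Spark with Spark SQL; the plain-Python version below only shows the underlying idea that each partition builds its own fixed-size sketch and the sketches are merged by cell-wise addition. The class name CountMinSketch, the helper top_k, the width and depth defaults, and the "pickup->dropoff" route key format are all assumptions made for illustration. Because a Count-Min Sketch cannot enumerate the keys it has seen, the candidate routes are assumed to be tracked separately.

```python
import hashlib
import heapq


class CountMinSketch:
    """Fixed-size frequency sketch; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        # width and depth are illustrative defaults, not values from the paper.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One column per row, derived from a row-salted stable hash.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum over rows is the least-inflated counter for this key.
        return min(self.table[row][col] for row, col in self._cells(key))

    def merge(self, other):
        # Sketches with the same shape and hashes merge by cell-wise addition.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        return self


def top_k(sketch, candidates, k):
    """Return (estimated_count, key) pairs for the k largest estimates."""
    return heapq.nlargest(k, ((sketch.estimate(c), c) for c in candidates))


# Hypothetical route keys, split across two partitions.
partition_a = ["A->B", "A->B", "C->D"]
partition_b = ["A->B", "C->D", "E->F"]

sketch_a, sketch_b = CountMinSketch(), CountMinSketch()
for route in partition_a:
    sketch_a.add(route)
for route in partition_b:
    sketch_b.add(route)

merged = sketch_a.merge(sketch_b)
frequent_routes = top_k(merged, {"A->B", "C->D", "E->F"}, k=2)
```

The memory used depends only on width and depth, not on the number of distinct routes, which is the sub-linear space property the abstract refers to. Estimates can overcount when keys collide in every row, so the Top-K result is approximate.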